worst-case performance
Details
Training stalls if the replay buffer holds fewer transitions than the minibatch size, i.e., if |B| < M. Algorithms 3 and 4 show the critic network update and the actor network and uncertainty parameter sampler update, respectively. Although we write the gradient-based updates in the form of mini-batch stochastic gradient updates for simplicity, we employ an adaptive approach such as Adam [16]. The update of pk follows an exponential moving average with momentum 1/Tlast, where Tlast is the number of steps spent in the last episode (Tlast is set to 1000 for the first episode). The reason behind this design choice is as follows. A short episode indicates that a bad uncertainty parameter ω was used in the last episode.
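The two rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `can_update` and `ema_update`, the `target` value that pk is moved toward, and the choice to apply the weight 1/Tlast to the new value (rather than to the old one) are all assumptions, since the text does not specify them.

```python
def can_update(buffer, minibatch_size):
    """Return True only when a full minibatch can be drawn, i.e. |B| >= M."""
    return len(buffer) >= minibatch_size


def ema_update(p_k, target, t_last):
    """Exponential moving average step for p_k.

    Assumption: the weight 1 / t_last is placed on the new value `target`,
    so a short episode (small t_last) changes p_k more strongly.
    """
    weight = 1.0 / t_last
    return (1.0 - weight) * p_k + weight * target


# Hypothetical values, for illustration only.
buffer = [("s", "a", 0.0, "s_next")] * 3   # replay buffer B with 3 transitions
M = 4                                      # minibatch size
print(can_update(buffer, M))               # training is stalled while |B| < M

p_k = 0.5
print(ema_update(p_k, target=1.0, t_last=1000))  # first episode: Tlast = 1000
print(ema_update(p_k, target=1.0, t_last=10))    # short episode: larger change
```

Under this reading, an episode that terminates after few steps pulls pk toward the target faster than a long one, which matches the stated rationale that a short episode signals a bad ω.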
A Proof
Proof of Proposition 4.2
Proposition 4.2 The performance gap of evaluating policy profile (π, µ) and (π, π
Proof of Theorem 4.7
We first prove a lemma.
Theorem A.2. (Theorem 1 in [36]) Let ϵ = max
Theorem 4.7 In a two-player game, suppose that
According to Theorem A.2, we have J ( π, µ) J ( π, α) E
CQL [20] regularizes the learning of the Q-function to penalize out-of-distribution actions.
The CSP algorithm is illustrated in Algorithm 1.
The proxy model is trained adversarially against our agent; therefore, we set the proxy's reward function to be the negative of our agent's reward.
We show the experimental details of the Maze example in this section.
Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations
Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees. We study the worst-case performance of recent graph-based approximate nearest neighbor search algorithms, such as HNSW, NSG and DiskANN. For DiskANN, we show that its "slow preprocessing" version provably supports approximate nearest neighbor search query with constant approximation ratio and poly-logarithmic query time, on data sets with bounded "intrinsic" dimension. For the other data structure variants studied, including DiskANN with "fast preprocessing", HNSW and NSG, we present a family of instances on which the empirical query time required to achieve a "reasonable" accuracy is linear in instance size. For example, for DiskANN, we show that the query procedure can take at least $0.1 n$ steps on instances of size $n$ before it encounters any of the $5$ nearest neighbors of the query.
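To make the notion of "steps before encountering a nearest neighbor" concrete, here is a minimal sketch of a generic best-first graph search that counts vertex expansions. It is an illustration only: the function `greedy_search`, its parameters, and the toy path graph are assumptions made for this example, and neither the graph nor the search is the DiskANN, HNSW or NSG construction or the hard instance family from the paper.

```python
import heapq
import math


def greedy_search(points, graph, start, query, beam_width=1, targets=frozenset()):
    """Generic best-first search over a neighbor graph.

    Returns (expanded, steps_to_hit), where `expanded` is the order in which
    vertices were expanded and `steps_to_hit` is the number of expansions
    performed when a vertex in `targets` was first scored (None if never).
    """
    scored = {start: math.dist(points[start], query)}
    expanded, expanded_set = [], set()
    steps_to_hit = 0 if start in targets else None
    while True:
        # Keep only the beam_width closest vertices scored so far.
        beam = heapq.nsmallest(beam_width, scored, key=scored.get)
        frontier = [v for v in beam if v not in expanded_set]
        if not frontier:
            break
        v = frontier[0]  # closest unexpanded vertex in the beam
        expanded.append(v)
        expanded_set.add(v)
        for u in graph[v]:
            if u not in scored:
                scored[u] = math.dist(points[u], query)
                if steps_to_hit is None and u in targets:
                    steps_to_hit = len(expanded)
    return expanded, steps_to_hit


# Toy instance (hypothetical): n points on a line, linked as a path.
n = 10
points = [(float(i), 0.0) for i in range(n)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
query = (float(n - 1), 0.0)

expanded, steps = greedy_search(points, graph, start=0, query=query,
                                targets=frozenset({n - 1}))
print(steps)  # number of expansions before the true nearest neighbor is seen
```

On this path graph the search must walk the whole path from the start vertex, so the step count grows linearly with n. It only illustrates how a poorly connected graph forces linear work; the paper's instances are designed to produce this behavior for the actual index constructions.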